Access network for address mapping in non-volatile memories

ABSTRACT

Systems and methods for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA) are described. One such method includes generating a first PBA candidate from an LBA using a first function; generating a second PBA candidate from the LBA using a second function; and selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application claims priority to and the benefit of U.S. Provisional Application No. 62/360,916, filed on Jul. 11, 2016, having Attorney Docket No. HGST-1011PROV (H20161077) and entitled "GENERATION OF RANDOM ADDRESS MAPPING IN NON-VOLATILE MEMORIES USING LOCAL AND GLOBAL INTERLEAVING", and is a continuation-in-part of U.S. patent application Ser. No. 14/967,169, filed on Dec. 11, 2015, having Attorney Docket No. HGST-1003 (H20151149US2) and entitled "GENERATION OF RANDOM ADDRESS MAPPING IN NON-VOLATILE MEMORIES USING LOCAL AND GLOBAL INTERLEAVING", which claims priority to and the benefit of U.S. Provisional Application No. 62/192,509, filed on Jul. 14, 2015, having Attorney Docket No. HGST-1003P (H20151149) and entitled "SYSTEMS AND METHODS FOR PROVIDING DYNAMIC WEAR LEVELING IN NON-VOLATILE MEMORIES". The entire content of each application referenced above is incorporated herein by reference.

FIELD

Aspects of the disclosure relate generally to mapping memory addresses, and more specifically, to address mapping in non-volatile memories.

BACKGROUND

In a variety of consumer electronics, solid state drives incorporating non-volatile memories (NVMs) are frequently replacing or supplementing conventional rotating hard disk drives for mass storage. These non-volatile memories may include one or more flash memory devices; the flash memory devices may be logically divided into blocks, and each of the blocks may be further logically divided into addressable pages. These addressable pages may be any of a variety of sizes (e.g., 512 bytes, 1 kilobyte, 2 kilobytes, 4 kilobytes), which may or may not match the logical block address sizes used by a host computing device.

During a write operation, data may be written to the individual addressable pages in a block of a flash memory device. However, to erase or rewrite a page, an entire block must typically be erased. Different blocks in each flash memory device may therefore be erased more or less frequently depending upon the data stored therein. Because the lifetime of the storage cells of a flash memory device correlates with the number of erase cycles, many solid state drives perform wear-leveling operations (both static and dynamic) to spread erasures more evenly over all of the blocks of a flash memory device.

To make sure that all of the physical pages in an NVM (e.g., a flash memory device) are used uniformly, the usual practice is to maintain a table of the frequency of use of all of the logical pages and to periodically map the most frequently accessed logical addresses to physical lines. However, these table-indirection-based methods incur significant overhead in table size. For instance, using a table approach for a 2 terabyte (TB) storage device with 512 byte pages would require a 137 gigabyte (GB) table, which is clearly not practical.
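The 137 GB figure can be reproduced with simple arithmetic. The sketch below assumes binary units (1 TB = 2^40 bytes) and about 32 bytes of table state per 512-byte page; the text does not state the per-entry size, so 32 bytes is an assumption chosen because it is what the quoted figure implies.

```python
# Back-of-the-envelope size of a per-page indirection table.
# Assumptions (not stated in the text): binary units and ~32 bytes per entry.
CAPACITY_BYTES = 2 * 2**40     # 2 TB drive
PAGE_BYTES = 512               # 512-byte pages
BYTES_PER_ENTRY = 32           # hypothetical per-entry cost (mapping plus usage count, etc.)

num_pages = CAPACITY_BYTES // PAGE_BYTES   # one table entry per logical page
table_bytes = num_pages * BYTES_PER_ENTRY

print(f"entries: {num_pages:,} (= 2**{num_pages.bit_length() - 1})")
print(f"table size: {table_bytes / 1e9:.1f} GB")
```

With these assumptions the script reports 2^32 entries and a table of about 137.4 GB; even at 4 bytes per entry the table would still need roughly 17 GB.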

SUMMARY

In one aspect, the disclosure provides a method for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA), the method comprising: generating a first PBA candidate from an LBA using a first function; generating a second PBA candidate from the LBA using a second function; and selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.

In another aspect, the disclosure provides a system for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA), the system comprising: a first network configured to generate a first PBA candidate from an LBA using a first function; a second network configured to generate a second PBA candidate from the LBA using a second function; and a select logic configured to select either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.

Another aspect of the disclosure provides a system for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA), the system comprising: means for generating a first PBA candidate from an LBA using a first function; means for generating a second PBA candidate from the LBA using a second function; and means for selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a solid state device (SSD) that can perform local address mapping in accordance with one embodiment of the disclosure.

FIG. 2 is a block diagram of a system for performing local address mapping, including an access network and a cumulative state computation block, that can be used to map logical block addresses (LBAs) to physical block addresses (PBAs) in accordance with one embodiment of the disclosure.

FIG. 3 is a block diagram of an access network, including a select logic block, that can be used in the address mapping system of FIG. 2 to map an LBA to a PBA in accordance with one embodiment of the disclosure.

FIG. 4 is a flow chart of a process for mapping an LBA to a PBA in accordance with one embodiment of the disclosure.

FIGS. 5-8 are diagrams of exemplary physical block addresses at discrete times illustrating operation of the select logic in mapping LBAs to PBAs for example values of the PBAs and move index variables in accordance with one embodiment of the disclosure.

FIG. 9 is a block diagram of a cumulative state computation block, including a bitonic network and a bitonic sorter, that can be used in the address mapping system of FIG. 2 in accordance with one embodiment of the disclosure.

FIG. 10 is a diagram of a bitonic sorter including a sorter table and a comparison type table in accordance with one embodiment of the disclosure.

FIG. 11 is a block diagram of another system for local address mapping including an access network and one or more read-only memories (ROMs) for storing pre-calculated cumulative state values in accordance with one embodiment of the disclosure.

FIGS. 12a, 12b, and 12c are schematic diagrams of ROMs for storing control state values, cumulative control state values, and use indicators that can be used in the system of FIG. 11 in accordance with one embodiment of the disclosure.

FIG. 13 is a block diagram of another access network, including a select logic block, that can be used in the address mapping system of FIG. 11 to map an LBA to a PBA in accordance with one embodiment of the disclosure.

FIG. 14 is a block diagram of an indirection table in accordance with one embodiment of the disclosure.

FIG. 15 is a block diagram of a general system for performing random address mapping using local and global interleaving in accordance with one embodiment of the disclosure.

FIG. 16 is a flow chart of a process for performing random address mapping using global mapping and local interleaving in accordance with one embodiment of the disclosure.

FIG. 17 is a block diagram of a system for performing random address mapping with bit inverse for global mapping (G bits) and permutation for local interleaving (N−G bits) in accordance with one embodiment of the disclosure.

FIG. 18 is a table illustrating a numerical example of global mapping using bit inverse on G bits in accordance with one embodiment of the disclosure.

FIG. 19 is a table illustrating a numerical example of local interleaving using a permutation on N−G bits in accordance with one embodiment of the disclosure.

FIG. 20 is a table illustrating a numerical example of global mapping using bit inverse and local interleaving using permutation in accordance with one embodiment of the disclosure.

FIG. 21 is a block diagram of a multi-stage interconnection network (MIN) that can be used to perform local interleaving in accordance with one embodiment of the disclosure.

FIG. 22 is a block diagram of a butterfly MIN that can be used to perform local interleaving in accordance with one embodiment of the disclosure.

FIG. 23 is a block diagram of a Benes MIN that can be used to perform local interleaving in accordance with one embodiment of the disclosure.

FIG. 24 is a block diagram of an Omega MIN that can be used to perform local interleaving in accordance with one embodiment of the disclosure.

FIG. 25 is a block diagram of a modified Omega MIN that can be used to perform local interleaving in accordance with one embodiment of the disclosure.

DETAILED DESCRIPTION

Referring now to the drawings, systems and methods for mapping logical block addresses (LBAs) to physical block addresses (PBAs) for non-volatile memories (NVMs) are illustrated. One such method involves determining a PBA of an NVM to enable a data access of a corresponding LBA, and includes (1) generating a first PBA candidate from an LBA using a first function, (2) generating a second PBA candidate from the LBA using a second function, and (3) selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate. In one example, the first function and/or the second function may include a function performed by at least one of a multi-stage interconnection network or a block cipher. In another example, the first function and/or the second function may further include an exclusive OR function.

Embodiments of these mapping systems and the corresponding methods may involve substantially less hardware, and more specifically, less storage to manage the mapping of LBAs to PBAs than, for example, the indirection tables discussed above. Moreover, these mapping systems and methods may work well in conjunction with random address mapping in non-volatile memories using local and global interleaving, as illustrated in FIGS. 15-25 and discussed in detail below.

FIG. 1 is a block diagram of a solid state device (SSD) that can perform local address mapping in accordance with one embodiment of the disclosure. The system 100 includes a host 102 and an SSD storage device 104 coupled to the host 102. The host 102 provides commands to the SSD storage device 104 for transferring data between the host 102 and the SSD storage device 104. For example, the host 102 may provide a write command to the SSD storage device 104 for writing data to the SSD storage device 104, or a read command to the SSD storage device 104 for reading data from the SSD storage device 104. The host 102 may be any system or device having a need for data storage or retrieval and a compatible interface for communicating with the SSD storage device 104. For example, the host 102 may be a computing device, a personal computer, a portable computer, a workstation, a server, a personal digital assistant, a digital camera, a digital phone, or the like.

The SSD storage device 104 includes a host interface 106, a controller 108, a memory 110, and a non-volatile memory 112. The host interface 106 is coupled to the controller 108 and facilitates communication between the host 102 and the controller 108. Additionally, the controller 108 is coupled to the memory 110 and the non-volatile memory 112. The host interface 106 may be any type of communication interface, such as an Integrated Drive Electronics (IDE) interface, a Universal Serial Bus (USB) interface, a Serial Peripheral (SP) interface, an Advanced Technology Attachment (ATA) interface, a Small Computer System Interface (SCSI), an IEEE 1394 (FireWire) interface, or the like. In some embodiments, the host 102 includes the SSD storage device 104. In other embodiments, the SSD storage device 104 is remote with respect to the host 102 or is contained in a remote computing system communicatively coupled with the host 102. For example, the host 102 may communicate with the SSD storage device 104 through a wireless communication link.

The controller 108 controls operation of the SSD storage device 104. In various embodiments, the controller 108 receives commands from the host 102 through the host interface 106 and performs the commands to transfer data between the host 102 and the non-volatile memory 112. The controller 108 may include any type of processing device, such as a microprocessor, a microcontroller, an embedded controller, a logic circuit, software, firmware, or the like, for controlling operation of the SSD storage device 104.

In some embodiments, some or all of the functions described herein as being performed by the controller 108 may instead be performed by another element of the SSD storage device 104. For example, the SSD storage device 104 may include a microprocessor, a microcontroller, an embedded controller, a logic circuit, software, firmware, or any kind of processing device for performing one or more of the functions described herein as being performed by the controller 108. In some embodiments, one or more of the functions described herein as being performed by the controller 108 are instead performed by the host 102. In some embodiments, some or all of the functions described herein as being performed by the controller 108 may instead be performed by another element, such as a controller in a hybrid drive including both non-volatile memory elements and magnetic storage elements.

The memory 110 may be any memory, computing device, or system capable of storing data. For example, the memory 110 may be a random-access memory (RAM), a dynamic random-access memory (DRAM), a static random-access memory (SRAM), a synchronous dynamic random-access memory (SDRAM), a flash storage, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. In various embodiments, the controller 108 uses the memory 110, or a portion thereof, to store data during the transfer of data between the host 102 and the non-volatile memory 112. For example, the memory 110 or a portion of the memory 110 may be a cache memory.

The non-volatile memory (NVM) 112 receives data from the controller 108 and stores the data. The non-volatile memory 112 may be any type of non-volatile memory, such as a flash storage system, a solid state drive, a flash memory card, a secure digital (SD) card, a universal serial bus (USB) memory device, a CompactFlash card, a SmartMedia device, a flash storage array, or the like.

The controller 108 or NVM 112 can be configured to perform any of the local address mapping schemes described herein.

One way to address the large indirection table issue for page-based NVMs, discussed in the background section above, is to improve the process of mapping logical pages to physical pages, and more specifically, the process of mapping logical block addresses (LBAs) to physical block addresses (PBAs).

Local Address Mapping

FIG. 2 is a block diagram of a system 200 for performing local address mapping, including an access network 202 and a cumulative state computation block 204, that can be used to map logical block addresses (LBAs) to physical block addresses (PBAs) in accordance with one embodiment of the disclosure. The system 200 further includes an initial and second memory map block 206, a background swap scheduler 208, and a mapping state generation and change block 210. In one aspect, the access network 202 can be implemented in hardware (e.g., with ultra-low latency, such as a 3-cycle pipeline delay, and a low logic and memory cost equivalent to fewer than 10,000 logic gates), and the remaining components of the system 200 can be implemented in firmware.

The access network 202, which will be discussed in greater detail below, receives the latest two cumulative control states, CCS1 and CCS2, from the cumulative state computation block 204, along with a move index from the background swap scheduler 208. Using these inputs, the access network 202 can determine which physical block address (PBA) a given logical block address (LBA) is mapped to, using two slave networks (e.g., bitonic or Benes networks) that each receive one of the two cumulative control states and generate a possible mapping.

The cumulative state computation block 204, which will be discussed in greater detail below, initially receives control states cs1 and cs2 and the cumulative control state CCS1 from the initial and second memory map block 206. In one aspect, the initial control states may have random values and CCS1 may be set to cs1. After an initial period, the cumulative state computation block 204 may receive these inputs from the mapping state generation and change block 210. Using these inputs, the cumulative state computation block 204 can determine a second cumulative control state, CCS2, which is a function of CCS1 and cs2. The control states cs1 and cs2 can be used as inputs to a master bitonic network, or another suitable network, and ultimately to determine the second cumulative control state, CCS2. The cumulative control states CCS1 and CCS2 can be used by the access network 202 to determine current LBA to PBA mappings. In one aspect, the cumulative state may be computed in firmware using the master bitonic network whenever the system changes the mapping, which it does periodically once all of the background transfers are complete. The background moves can be scheduled in firmware with another bitonic network using the new control state (e.g., cs2).

In several applications, such as dynamic wear leveling, which changes its random LBA-to-PBA memory map on a periodic basis, the system 200 may need to compute a cumulative random mapping at any given time so that a given LBA can be precisely located at the correct PBA. In one example, assume a random map of a memory of size 2^32 with a mapping function f1 at time t1, a random map with a mapping function f2 at time t2, a random map with a mapping function f3 at time t3, . . . , and a random map with a mapping function fn at time tn. In operation, the system 200 can compute a cumulative function cfn at time tn such that cfn = fn(cfm), where cfm is the cumulative function at the preceding time tm (i.e., tm = t(n−1)). In one aspect, the system 200 can generate each random mapping function fn using a bitonic network configured by a random control switch seed (e.g., cs1, cs2, . . . , csn), for example in the cumulative state computation block 204. The cumulative function cfn can then be passed through a master bitonic sorter, and the control switch positions are recorded during the sorting process. These control switch positions, CCSn, can then be used to program a bitonic network with a data width of 1 and a network size of 32 to generate a cumulative random mapping for 2^32 entries (e.g., using the access network 202). At any time, any of the 2^32 entries can be passed through this network to generate a permuted address. These operations will be described in greater detail below.
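The control-state flow just described (a random seed yields a mapping function, mapping functions are composed into a cumulative function, a master sorter records switch positions, and an access network of data width 1 permutes address bits) can be sketched as a toy model. The sketch assumes the textbook iterative bitonic comparator layout, treats each comparator as a 2x2 switch driven by one control bit, and uses an 8-bit address instead of 32 bits. All helper names are hypothetical. For 32 lanes this layout has 240 switches, so it does not reproduce the 320-bit control state given for cs2 in this description.

```python
import random

def bitonic_comparators(n):
    """Switch positions (i, l, ascending) of an n-input bitonic sorter; n is a power of 2."""
    comps = []
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):
                l = i ^ j
                if l > i:
                    comps.append((i, l, (i & k) == 0))
            j //= 2
        k *= 2
    return comps

def route(values, controls, comps):
    """Use the network as a switch fabric: exchange lanes i and l wherever the control bit is 1."""
    out = list(values)
    for bit, (i, l, _) in zip(controls, comps):
        if bit:
            out[i], out[l] = out[l], out[i]
    return out

def record_controls(perm, comps):
    """Master sorter: bitonic-sort perm and record which switches exchanged (a CCS)."""
    a = list(perm)
    controls = []
    for i, l, ascending in comps:
        swap = a[i] > a[l] if ascending else a[i] < a[l]
        if swap:
            a[i], a[l] = a[l], a[i]
        controls.append(int(swap))
    return controls

def perm_from_seed(cs, comps, n):
    """Mapping f (lane i goes to position f[i]) produced by routing lanes with control bits cs."""
    routed = route(range(n), cs, comps)
    f = [0] * n
    for pos, lane in enumerate(routed):
        f[lane] = pos
    return f

def map_address(lba, ccs, comps, n):
    """Access network with data width 1: permute the n address bits of lba."""
    bits = [(lba >> i) & 1 for i in range(n)]
    out = route(bits, ccs, comps)
    return sum(b << i for i, b in enumerate(out))

N_BITS = 8                                   # toy address width (the text uses 32)
comps = bitonic_comparators(N_BITS)
rng = random.Random(2016)                    # stand-in source of random control seeds
cf = list(range(N_BITS))                     # cf0: identity mapping of bit lanes
for _ in range(3):                           # three periodic map changes
    cs = [rng.randint(0, 1) for _ in comps]  # random control seed cs_n
    f = perm_from_seed(cs, comps, N_BITS)
    cf = [f[x] for x in cf]                  # cf_n = f_n(cf_m)
ccs = record_controls(cf, comps)             # cumulative control state CCS_n
print(cf, map_address(0x5A, ccs, comps, N_BITS))
```

Because every switch only exchanges two lanes, any setting of the control bits yields a permutation of the address bits, and therefore a one-to-one mapping over all 2^N_BITS addresses. Replaying the recorded switch settings moves the bit in lane i to lane cf[i], which is why a short control vector can stand in for a full-size table.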

The background swap scheduler 208 is configured to perform periodic swaps of data stored at preselected PBAs. In one aspect, the background swap scheduler 208 may be configured to perform one swap per 100 host writes. In another aspect, the background swap scheduler 208 may be configured to perform one swap per X host writes, where X is a positive integer. In one aspect, the background swap scheduler 208 is configured to perform moves according to a new map for two pages at a time (a swap), and thus moves are scheduled for every 200 host writes. The background swap scheduler 208 may maintain a move counter, which may be incremented by 1 for every 200 host writes. In one aspect, moves are performed in a structured fashion on the physical memory using a lookup of a bitonic network configured with the new control state (e.g., cs2). In one aspect, the move counter (e.g., move index) is incremented from 1 to N/2. The move counter can also be referred to as move index, move_index, MOVE_INDEX, move_counter, or move counter. For each value of the move counter, a swap is scheduled such that the physical memory at the address equal to the move counter is swapped with the physical memory at the address obtained by applying the new mapping to the move counter. In one embodiment, for example, the background swap scheduler 208 can perform the swap as follows:

Physical_addr1 = MOVE_INDEX;

Physical_addr2 = f_cs2(Physical_addr1);

SWAP(Physical_addr1, Physical_addr2);

Here, f_cs2 is the resulting random mapping function based on control state cs2. The determination of cs2 is described in greater detail below in the discussion of FIG. 9. In one example, cs2 can be a randomly generated bit sequence of 320 bits for a bitonic network with 32 inputs and 32 outputs.
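The swap pseudocode above can be modeled in a few lines of Python. This sketch rests on an assumption the text does not state explicitly: f_cs2 is taken to pair each address in the lower half of the PBA map with a distinct address in the upper half and to be its own inverse, so that a single pass of MOVE_INDEX over the lower half completes every move. That pairing is also what the midpoint test in the select logic of FIG. 3 appears to rely on. The helper names (make_pairing, do_scheduled_swap) are hypothetical, and MOVE_INDEX is 0-based here.

```python
import random

def make_pairing(n, seed=0):
    """Hypothetical f_cs2: pair each lower-half PBA with a distinct upper-half PBA.

    The result is its own inverse (f[f[p]] == p), so one swap per pair moves
    both pages to their slots under the new map.
    """
    upper = list(range(n // 2, n))
    random.Random(seed).shuffle(upper)
    f = [0] * n
    for low, high in zip(range(n // 2), upper):
        f[low], f[high] = high, low
    return f

def do_scheduled_swap(memory, f_cs2, move_index):
    """One background swap: Physical_addr1 = MOVE_INDEX, Physical_addr2 = f_cs2(Physical_addr1)."""
    addr1 = move_index
    addr2 = f_cs2[addr1]
    memory[addr1], memory[addr2] = memory[addr2], memory[addr1]

N = 16
f_cs2 = make_pairing(N)
memory = [f"page{p}" for p in range(N)]   # slot p holds the page the old map placed there
for move_index in range(N // 2):          # MOVE_INDEX sweeps the lower half once
    do_scheduled_swap(memory, f_cs2, move_index)
# Afterwards, the page that was in slot p now sits in slot f_cs2[p].
```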

In one embodiment, MOVE_INDEX is set to 0 in the initial and second memory map block 206 and also in the mapping state generation and change block 210. In the background swap scheduler 208, MOVE_INDEX can be incremented by 1 after an arbitrary number of host writes (e.g., every 100 host writes as in FIG. 2, every 200 host writes, or another suitable number of host writes). In another embodiment, the MOVE_INDEX increment logic can be implemented in hardware, as it may be easier to keep track of the host writes in hardware. In that case, a new hardware logic block that implements the MOVE_INDEX increment logic can communicate MOVE_INDEX to the background swap scheduler 208 and directly to the access network 202, instead of MOVE_INDEX being communicated from the background swap scheduler 208 (e.g., firmware) to the access network 202 (e.g., hardware).
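Whether it lives in firmware or hardware, the MOVE_INDEX increment logic behaves like a simple counter. The Python model below is only a behavioral sketch: the class and method names (MoveIndexCounter, on_host_write) are hypothetical, writes_per_move defaults to 200 (one of the example rates given above), and the counter stops at N/2, leaving the reset to the mapping state generation and change block 210.

```python
class MoveIndexCounter:
    """Hypothetical MOVE_INDEX logic: advance once per `writes_per_move` host writes."""

    def __init__(self, n_pbas, writes_per_move=200):
        self.limit = n_pbas // 2          # all swaps are done when MOVE_INDEX reaches N/2
        self.writes_per_move = writes_per_move
        self.host_writes = 0
        self.move_index = 0               # starts at 0, as described above

    def on_host_write(self):
        """Count one host write; return True when a new swap should be scheduled."""
        if self.move_index >= self.limit:
            return False                  # map change pending; block 210 resets the counter
        self.host_writes += 1
        if self.host_writes == self.writes_per_move:
            self.host_writes = 0
            self.move_index += 1
            return True
        return False

counter = MoveIndexCounter(n_pbas=16)
swaps = sum(counter.on_host_write() for _ in range(2000))
print(swaps, counter.move_index)
```

With 16 PBAs, 2000 host writes are enough to schedule all 8 swaps; the remaining writes are ignored until the counter is reset for the next map.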

In one aspect, these operations of the background swap scheduler 208 may result in a write amplification of 1 percent. In one aspect, the swap operation is assumed to be atomic.

The mapping state generation and change block 210 is configured to update the control states and cumulative control states once all of the swap transfers are complete. In one aspect, when the move index is equal to N/2, all of the swap transfers from the previous map to the current map should be complete. The mapping state generation and change block 210 can then generate a new map. In one aspect, the move counter (e.g., move index) can be reset (e.g., to 0 or 1). Whenever the mapping change is done, the cumulative control states can be computed in firmware and supplied to hardware. These values can be scheduled slightly in advance in the firmware (e.g., in the mapping state generation and change block 210) to ensure timely communication to the hardware (e.g., access network 202). In one aspect, the old control state (cs1) may be set to the new control state (cs2), and the old cumulative control state (CCS1) may be set to the new cumulative control state (CCS2).

Aspects of the access network 202 and the cumulative state computation block 204 will be discussed in greater detail below.

FIG. 3 is a block diagram of an access network 300, including a select logic block 302, that can be used in the address mapping system of FIG. 2 to map an LBA to a PBA in accordance with one embodiment of the disclosure. In one aspect, the access network 300 can be used in the system of FIG. 2 as the access network 202. The access network 300 further includes a first bitonic network 304 and a second bitonic network 306. The first bitonic network 304 can receive the LBA and the new cumulative control state (CCS2) and generate a second possible physical block address (PBA2). Similarly, the second bitonic network 306 can receive the LBA and the old cumulative control state (CCS1) and generate a first possible physical block address (PBA1). The select logic 302 can then analyze the locations of the possible PBAs in the page to determine, using a preselected algorithm, which one is the correct mapping. More specifically, the select logic 302 can compare PBA2 to the number of PBAs in the page (N) divided by 2 (i.e., N/2). If PBA2 is less than N/2, a temporary variable (Pba_mc) is set to PBA2; otherwise, Pba_mc is set to PBA1. If Pba_mc is less than the move index (MOVE_INDEX) from the background swap scheduler 208 of FIG. 2, the correct PBA (e.g., the output PBA) is PBA2; otherwise, the correct PBA is PBA1. The operation of the select logic 302 will be described further below.
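The two-condition test can be written as a small function. The sketch below (function names are hypothetical) reproduces the FIG. 5 and FIG. 6 examples and then checks, by simulation, that the selected PBA always holds the LBA's data throughout a swap schedule. The simulation assumes that the new map differs from the old one by a pairing of lower-half and upper-half PBAs, swapped in MOVE_INDEX order starting at 0; that is our reading of the scheme, not something the text states in these terms.

```python
import random

def select_pba(pba1, pba2, move_index, n):
    """Select logic of FIG. 3: pba1 is the CCS1 (old) candidate, pba2 the CCS2 (new) one."""
    # First condition: the pair's swap is indexed by whichever candidate is in the lower half.
    pba_mc = pba2 if pba2 < n // 2 else pba1
    # Second condition: swaps with an index below move_index have already completed.
    return pba2 if pba_mc < move_index else pba1

def selection_always_correct(n=16, seed=0):
    """Simulate a full swap schedule and confirm select_pba tracks every LBA."""
    rng = random.Random(seed)
    old_map = list(range(n))
    rng.shuffle(old_map)                          # LBA -> PBA1 under the old map
    upper = list(range(n // 2, n))
    rng.shuffle(upper)
    f = [0] * n                                   # hypothetical pairing between the maps
    for low, high in zip(range(n // 2), upper):
        f[low], f[high] = high, low
    new_map = [f[p] for p in old_map]             # LBA -> PBA2 under the new map
    memory = [0] * n
    for lba, pba in enumerate(old_map):
        memory[pba] = lba                         # physical slot -> LBA stored there
    for move_index in range(n // 2 + 1):
        if any(memory[select_pba(old_map[lba], new_map[lba], move_index, n)] != lba
               for lba in range(n)):
            return False
        if move_index < n // 2:                   # background swap number move_index
            a, b = move_index, f[move_index]
            memory[a], memory[b] = memory[b], memory[a]
    return True

print(select_pba(14, 4, move_index=2, n=16))  # FIG. 5 values: prints 14
print(select_pba(14, 4, move_index=5, n=16))  # FIG. 6 values: prints 4
print(selection_always_correct())             # prints True
```

The midpoint test works because, under this pairing, exactly one member of each swapped pair lies in the lower half, and that member's address is the MOVE_INDEX value at which the pair is swapped.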

In one aspect, the select logic block 302 can effectively determine which of the two possible PBAs (e.g., PBA1 and PBA2) contains the actual data that corresponds to the LBA of interest at a given time. This determination is based on the mid-point of the PBAs in the page (e.g., N/2) and the move index. For example, in FIG. 5, which will be discussed in greater detail below, LBA 9 is stored in PBA 3 at time period CF0, in PBA 11 at CF1, in PBA 8 at CF2, in PBA 14 at CFn-1, and in PBA 4 at CFn. The system can keep track of the last two possible locations, PBA 14 and PBA 4, which are the outputs of the CCS1 and CCS2 functions, respectively. The select logic block 302 can then determine exactly whether the data related to LBA 9 is still at PBA 14 or has been moved to PBA 4.

In one aspect, the first bitonic network 304 and the second bitonic network 306 can be replaced with a first network and a second network, respectively. In such case, the first network can be configured to generate a first PBA candidate from an LBA using a first function, and the second network can be configured to generate a second PBA candidate from the LBA using a second function. In one aspect, the first function and/or the second function may be a function performed by a multi-stage interconnection network and/or a block cipher. The multi-stage interconnection network may be implemented with one or more of a Benes network, an inverse Benes network, a bitonic network, an inverse bitonic network, an Omega network, an inverse Omega network, a Butterfly network, or an inverse Butterfly network. In one aspect, the first function and/or the second function may include an exclusive OR function and a function performed by a multi-stage interconnection network and/or a block cipher.
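As a sketch of a candidate function that combines an exclusive OR with an interconnection-network stage, the code below XORs the LBA with a key and then permutes its address bits (the effect of a data-width-1 network). The keys, bit permutations, and function names are illustrative assumptions, not values from the disclosure, and the block cipher option mentioned above is not shown.

```python
def bit_permute(x, perm):
    """Move bit i of x to bit position perm[i] (a data-width-1 network stage)."""
    return sum(((x >> i) & 1) << perm[i] for i in range(len(perm)))

def pba_candidate(lba, key, perm):
    """Hypothetical candidate function: XOR with a key, then a bit permutation."""
    return bit_permute(lba ^ key, perm)

# Illustrative parameters for a 4-bit address space (16 PBAs).
old_fn = dict(key=0b1010, perm=[2, 0, 3, 1])   # stands in for the first function
new_fn = dict(key=0b0110, perm=[1, 3, 0, 2])   # stands in for the second function

lba = 9
pba1 = pba_candidate(lba, **old_fn)
pba2 = pba_candidate(lba, **new_fn)
print(pba1, pba2)  # prints 5 15
```

Both stages are invertible (XOR with a fixed key and a reordering of bits), so each candidate function is a one-to-one mapping from LBAs to PBAs and no two LBAs can collide on the same PBA.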

In one aspect, any one of the select logic 302, the first bitonic network 304, and/or the second bitonic network 306 can be a special purpose processor or other suitable hardware (such as an application specific integrated circuit or other hardware described above) specifically configured or programmed to perform any of the functions described in the application, such as the functions illustrated in FIG. 4.

FIG. 4 is a flow chart of a process 400 for mapping an LBA to a PBA in accordance with one embodiment of the disclosure. In one embodiment, the process 400 can be performed by the access network 300 of FIG. 3, or any of the other local address mapping systems described herein. In block 402, the process generates a first physical block address (PBA) candidate from an LBA using a first function. In one aspect, the first function may be a function performed by the first network (e.g., the first bitonic network 304 of FIG. 3) as described above. In certain aspects, the actions of block 402 may be effectuated with the controller 108, or with the controller 108 in combination with the host 102 as illustrated in FIG. 1. In certain aspects, block 402 may be effectuated with the first bitonic network 304 of FIG. 3, the second bitonic network 306 of FIG. 3, the select logic 302 of FIG. 3, the controller 108 of FIG. 1, and/or any combination of those components. In one aspect, block 402 may be effectuated with the first bitonic network 304. In one aspect, block 402 may represent one means for generating a first PBA candidate from an LBA using a first function.

In block 404, the process generates a second physical block address (PBA) candidate from the LBA using a second function. In one aspect, the second function may be a function performed by the second network (e.g., the second bitonic network 306 of FIG. 3) as described above. In certain aspects, the actions of block 404 may be effectuated with the controller 108, or with the controller 108 in combination with the host 102 as illustrated in FIG. 1. In certain aspects, block 404 may be effectuated with the first bitonic network 304 of FIG. 3, the second bitonic network 306 of FIG. 3, the select logic 302 of FIG. 3, the controller 108 of FIG. 1, and/or any combination of those components. In one aspect, block 404 may be effectuated with the second bitonic network 306. In one aspect, block 404 may represent one means for generating a second PBA candidate from the LBA using a second function.

In block 406, the process selects either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate. In one aspect, the selection may be performed by the select logic 302 of FIG. 3. In certain aspects, the actions of block 406 may be effectuated with the controller 108, or with the controller 108 in combination with the host 102 as illustrated in FIG. 1. In certain aspects, block 406 may be effectuated with the select logic 302 of FIG. 3, the controller 108 of FIG. 1, and/or any combination of those components. In one aspect, block 406 may be effectuated with the select logic 302. In one aspect, block 406 may represent one means for selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.

In one aspect, the information related to the background swap of data stored at the first PBA candidate and the background swap of data stored at the second PBA candidate includes a status of the background swap of data stored at the first PBA candidate and a status of the background swap of data stored at the second PBA candidate. In one aspect, the first PBA candidate and the second PBA candidate may be contained within a PBA map. In such case, examples of the status data may include a position of the second PBA candidate relative to a midpoint of all entries in the PBA map, a PBA move counter based on the position of the second PBA candidate, and/or a move index indicative of a current position of PBA swaps within the PBA map. Examples of the selection process and the use of the mapping status data will be described in further detail below.

In one aspect, the process 400 can also include mapping a portion of a physical address space containing the selected PBA candidate to another portion of the physical address space using at least one of a background data move or a background data swap. In one aspect, this mapping can be performed by the background swap scheduler 208 of FIG. 2.

In an alternative embodiment, the selection of either the first PBA candidate or the second PBA candidate can be performed using a memory table (see, for example, the system 1100 of FIG. 11, which may store various control states in a ROM or other suitable memory).

In one aspect, the process enables data access of an NVM, where the data access may be a read access or a write access.

FIGS. 5-8 are diagrams of exemplary physical block addresses at discrete times illustrating operation of the select logic in mapping LBAs to PBAs, for example values of the PBAs and move index variables, in accordance with one embodiment of the disclosure.

FIG. 5 illustrates operation of the select logic with example values of the PBAs and move index variables where the first condition (e.g., PBA2<N/2) is satisfied and the second condition (e.g., PBA_mc<move_index) is not satisfied, such that the correct PBA is PBA1, or slot 14. The diagram 500 shows the physical block address (PBA) memory maps at different time stages (e.g., CF0 to CFn). The select logic operates using the last two memory maps (CFn and CFn-1). Input variables include the move index (move_index=2), the number of entries in the PBA map (N=16), the local bits permuted (L=8), and the global bits permuted (G=1). While variables L and G are shown, they may or may not be used in the select logic. Since PBA2 is a location that has not yet been swapped (it is not less than the move index, move_index=2 in this example), the select logic effectively determines that PBA2 is not correct and selects PBA1, which it knows to be correct. More specifically, in the first condition, the select logic determines that PBA2=4 is less than N/2=8. Thus, PBA_mc is set to PBA2=4. In the second condition, the select logic determines that PBA_mc=4 is not less than move_index=2, and thus sets the output PBA to be PBA1=14.

In one aspect, the first condition can be changed to compare PBA1 to N/2 (e.g., PBA1>=N/2).

FIG. 6 illustrates operation of the select logic with example values of the PBAs and move index variables where the first condition (e.g., PBA2<N/2) is satisfied and the second condition (e.g., PBA_mc<move_index) is satisfied, such that the correct PBA is PBA2, or slot 4. The diagram 600 shows the physical block address (PBA) memory maps at different time stages (e.g., CF0 to CFn). The select logic operates using the last two memory maps (CFn and CFn-1). Input variables include the move index (move_index=5), the number of entries in the PBA map (N=16), the local bits permuted (L=8), and the global bits permuted (G=1). While variables L and G are shown, they may or may not be used in the select logic. Since PBA2 is a slot that has already been swapped (it is less than the move index, move_index=5 in this example), the select logic effectively determines that PBA2 is correct and selects it. More specifically, in the first condition, the select logic determines that PBA2=4 is less than N/2=8. Thus, PBA_mc is set to PBA2=4. In the second condition, the select logic determines that PBA_mc=4 is less than move_index=5, and thus sets the output PBA to be PBA2=4.

FIG. 7 illustrates operation of the select logic with example values of the PBAs and move index variables where the first condition (e.g., PBA2<N/2) is not satisfied and the second condition (e.g., PBA_mc<move_index) is also not satisfied, such that the correct PBA is PBA1, or slot 5. The diagram 700 shows the physical block address (PBA) memory maps at different time stages (e.g., CF0 to CFn). The select logic operates using the last two memory maps (CFn and CFn-1). Input variables include the move index (move_index=2), the number of entries in the PBA map (N=16), the local bits permuted (L=8), and the global bits permuted (G=1). While variables L and G are shown, they may or may not be used in the select logic. Since PBA2 is a slot (e.g., slot 10) that has not yet been swapped (it is greater than the move index, move_index=2 in this example), the select logic effectively determines that PBA2 is not correct and selects PBA1, which it knows to be correct. More specifically, in the first condition, the select logic determines that PBA2=10 is not less than N/2=8. Thus, PBA_mc is set to PBA1=5. In the second condition, the select logic determines that PBA_mc=5 is not less than move_index=2, and thus sets the output PBA to be PBA1=5.

FIG. 8 illustrates operation of the select logic with example values of the PBAs and move index variables where the first condition (e.g., PBA2<N/2) is not satisfied and the second condition (e.g., PBA_mc<move_index) is satisfied, such that the correct PBA is PBA2, or slot 10. The diagram 800 shows the physical block address (PBA) memory maps at different time stages (e.g., CF0 to CFn). The select logic operates using the last two memory maps (CFn and CFn-1). Input variables include the move index (move_index=6), the number of entries in the PBA map (N=16), the local bits permuted (L=8), and the global bits permuted (G=1). While variables L and G are shown, they may or may not be used in the select logic. Since PBA2 (e.g., slot 10) has already been swapped, PBA1 having been swapped to PBA2 (move_index=6 is greater than PBA1=5), the select logic effectively determines that PBA2 is correct and selects it. More specifically, in the first condition, the select logic determines that PBA2=10 is not less than N/2=8. Thus, PBA_mc is set to PBA1=5. In the second condition, the select logic determines that PBA_mc=5 is less than move_index=6, and thus sets the output PBA to be PBA2=10.
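The two conditions walked through in FIGS. 5-8 can be sketched as a small Python function. This is a minimal sketch, not the select logic 302 itself: the function name select_pba is hypothetical, and we assume PBA1 is the candidate under the older mapping (CFn-1) and PBA2 the candidate under the newer mapping (CFn), consistent with FIG. 8 describing PBA1 as having been swapped to PBA2. The usage lines replay the values stated for FIGS. 5, 7, and 8 (FIG. 6 does not state PBA1).

```python
def select_pba(pba1: int, pba2: int, move_index: int, n: int) -> int:
    """Hypothetical sketch of the select logic for one LBA.

    pba1: candidate PBA under the older mapping (assumed CFn-1)
    pba2: candidate PBA under the newer mapping (assumed CFn)
    move_index: how far the background swaps have progressed in the PBA map
    n: number of entries in the PBA map
    """
    # First condition: the candidate in the lower half of the map serves
    # as the PBA move counter (PBA_mc).
    pba_mc = pba2 if pba2 < n // 2 else pba1
    # Second condition: once the swaps have passed PBA_mc, the data has
    # already been moved to PBA2; otherwise it is still at PBA1.
    return pba2 if pba_mc < move_index else pba1


# (figure, PBA1, PBA2, move_index) with N = 16, as stated for FIGS. 5, 7, 8
for fig, pba1, pba2, move_index in [("FIG. 5", 14, 4, 2),
                                     ("FIG. 7", 5, 10, 2),
                                     ("FIG. 8", 5, 10, 6)]:
    print(fig, select_pba(pba1, pba2, move_index, n=16))
# prints "FIG. 5 14", then "FIG. 7 5", then "FIG. 8 10"
```

Only two comparisons are needed per lookup, which is why the logic is cheap enough to sit in the access path next to the two mapping networks.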

FIG. 9 is a block diagram of a cumulative state computation block 900 including a bitonic network 902 and a bitonic sorter 904 that can be used in the address mapping system of FIG. 2 in accordance with one embodiment of the disclosure. The cumulative state computation block 900 further includes a cumulative mapping block 906 that may generate/perform some initial mapping and that receives the next output of the bitonic network 902 via feedback. The bitonic network 902, a time-varying network that can also serve as a master bitonic network in this system, receives the output of the cumulative mapping block 906 and the control state (cs) and generates a new cumulative mapping. The bitonic sorter 904 receives the new cumulative mapping and determines the switch settings (e.g., cumulative control states or CCS2) needed to go from the initial cumulative mapping to the new cumulative mapping.

In one aspect, at any given time, the system may store the last two values of the CCS (for access determination in the hardware, or access network) and the current values of the CS (for moving). So in one example, the control state memory is only about 960 bits (e.g., 320×3 bits, that is, one bit for each of 320 switches in each of the three stored mappings). In such case, a global mapping bit for each of these three mappings (i.e., 3 more bits) may also need to be preserved.

As to the use of a bitonic network as compared with a Benes network (described above in the discussion of FIG. 3), the bitonic network can have log2(L/2)*(log2(L/2)+1)/2*L/2 switches, while the Benes network can have 2*log2(L/2)*L/2 switches. For example, for L=32 such that L/2=16, the Benes network can have 8 (=2*log2(16)) stages of switches, where each stage consists of 16 (=L/2) switches. In such case, the bitonic network has 20 (=4*(4+1)/2, i.e., log2(16)*(log2(16)+1)/2) stages of switches, where each stage consists of 16 (=L/2) switches. So the bitonic network may need to be pipelined more deeply to achieve one address lookup per cycle. The number of 2-by-2 switches needed may thus be 320, versus 128 for the Benes network, which is still small. In one aspect, each switch has two 1-bit multiplexers, and each multiplexer needs 3 gates (2 AND gates and 1 OR gate), for 6 gates per switch. So about 2000 gates versus about 700 gates (more precisely, 320×6 gates versus 128×6 gates) may be used to implement each network. In one aspect, for an access network containing two such networks, this may result in about 4000 gates for the bitonic approach versus about 1400 gates for the Benes approach. However, the firmware may be much simpler for the bitonic network.

Aspects of the bitonic sorter and bitonic network will be described in greater detail below.

FIG. 10 is a diagram of a bitonic sorter 1000 including a sorter table 1002 and a comparison type table 1004 in accordance with one embodiment of the disclosure. A bitonic sorter can have log2(L)*(log2(L)+1)/2*L/2 comparators, arranged in log2(L)*(log2(L)+1)/2 stages of L/2 comparators each. For example, say L=8, and thus L/2=4. In such case, the bitonic sorter can have six stages of comparators, where log2(8)*(log2(8)+1)/2=3*(3+1)/2=6, and each stage consists of 4 (=L/2) comparators.

The comparison type table 1004, or “cmp_type”, is a matrix with the number of rows equal to log2(L)*(log2(L)+1)/2 (e.g., equal to the number of stages of comparators, 6) and the number of columns equal to L/2 (e.g., equal to the number of comparators in each stage, 4). So for L=8, as in the working example, cmp_type 1004 is a 6×4 matrix. The first row (or, in general, the ith row) in this cmp_type matrix 1004 corresponds to the comparator types of the first stage of comparators (or, in general, the ith stage of comparators) in diagram 1000. Comparator type 0 (e.g., row 1, column 1 of cmp_type 1004) means a comparator taking two inputs (in1, in2) and presenting the outputs (out1, out2) such that the first output is the smaller of the inputs (e.g., out1=minimum(in1,in2)) and the second output is the larger of the inputs (e.g., out2=maximum(in1,in2)). This is shown with the down arrow in diagram 1000. In one aspect, the comparator also gives an output bit that is equal to 1 if a swap occurred (e.g., out1=in2, out2=in1) and 0 if no swap occurred (e.g., out1=in1 and out2=in2). This aspect is not shown in diagram 1000.

Comparator type 1 (e.g., row 1, column 2 of cmp_type 1004) means a comparator taking two inputs (in1, in2) and presenting the outputs (out1, out2) such that the first output is the larger of the inputs (e.g., out1=maximum(in1,in2)) and the second output is the smaller of the inputs (e.g., out2=minimum(in1,in2)). This is shown with the upward arrow in diagram 1000. In one aspect, the comparator also gives an output bit that is equal to 1 if a swap occurred (e.g., out1=in2, out2=in1) and 0 if no swap occurred (e.g., out1=in1, out2=in2). This aspect is not shown in diagram 1000.

The sorter table 1002, “sorter_ind”, is a matrix with a number of rows equal to log2(L)*(log2(L)+1)/2 (e.g., equal to the number of stages of comparators, 6) and a number of columns equal to L (e.g., equal to the number of inputs to each stage of comparators, 8). So for L=8, as in the working example, sorter_ind 1002 is a 6×8 matrix. The first row (or, in general, the ith row) in this sorter_ind matrix 1002 corresponds to the port numbers that are connected to the inputs of the first (or, in general, the ith) stage of comparators.

In one aspect, a sequence can be bitonic if it monotonically increases and then monotonically decreases, or if it can be circularly shifted to monotonically increase and then monotonically decrease.
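The definition above can be checked directly by trying every circular shift. The following is a minimal sketch; the function name is_bitonic is hypothetical, and "monotonically" is read as non-strict (equal neighbors allowed).

```python
from __future__ import annotations


def is_bitonic(seq: list[int]) -> bool:
    """True if some circular shift of seq rises, then falls (non-strictly)."""
    n = len(seq)
    for shift in range(max(n, 1)):
        s = seq[shift:] + seq[:shift]
        i = 0
        while i + 1 < n and s[i] <= s[i + 1]:  # increasing run
            i += 1
        while i + 1 < n and s[i] >= s[i + 1]:  # then decreasing run
            i += 1
        if i + 1 >= n:  # reached the end: this shift is bitonic
            return True
    return False


print(is_bitonic([1, 3, 5, 4, 2]))  # True
print(is_bitonic([4, 2, 1, 3, 5]))  # True (a circular shift of the above)
print(is_bitonic([1, 3, 2, 4]))     # False
```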

In one aspect, a bitonic network can have the same topology as that of the bitonic sorter 1000, except that the comparators are replaced with 2-by-2 switches with control inputs.
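The sorter described for FIG. 10 and a bitonic network with the same topology can be sketched together in Python. Assumption: the stage schedule uses the standard iterative bitonic construction (port i paired with port i XOR j), which yields log2(L)*(log2(L)+1)/2 stages of L/2 comparators but may number ports differently from the sorter_ind table of FIG. 10. The swap bits recorded by the sorter play the role of control states: replaying them through the network moves any data along the same paths, which illustrates how a bitonic sorter can derive switch settings for a target mapping. The mapping `target` and all function names are hypothetical.

```python
from __future__ import annotations


def bitonic_schedule(size: int) -> list[list[tuple[int, int, int]]]:
    """Comparator stages (port_a, port_b, cmp_type) of a bitonic sorter.

    cmp_type 0 puts the minimum on port_a; cmp_type 1 puts the maximum there.
    """
    if size < 2 or size & (size - 1):
        raise ValueError("size must be a power of two >= 2")
    stages = []
    k = 2
    while k <= size:
        j = k // 2
        while j >= 1:
            stage = []
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    stage.append((i, partner, 0 if (i & k) == 0 else 1))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def bitonic_sort(values, stages):
    """Run the sorter; return sorted data and one swap bit per comparator."""
    data = list(values)
    control = []
    for stage in stages:
        bits = []
        for a, b, cmp_type in stage:
            swap = data[a] > data[b] if cmp_type == 0 else data[a] < data[b]
            if swap:
                data[a], data[b] = data[b], data[a]
            bits.append(int(swap))
        control.append(bits)
    return data, control


def bitonic_network(values, stages, control):
    """Same topology, but every 2x2 switch just obeys its control bit."""
    data = list(values)
    for stage, bits in zip(stages, control):
        for (a, b, _), cross in zip(stage, bits):
            if cross:
                data[a], data[b] = data[b], data[a]
    return data


stages = bitonic_schedule(8)
print(len(stages), [len(s) for s in stages])  # 6 [4, 4, 4, 4, 4, 4]

target = [5, 2, 7, 0, 3, 6, 1, 4]  # hypothetical mapping
_, control = bitonic_sort(target, stages)
print(bitonic_network(list("abcdefgh"), stages, control))
# ['d', 'g', 'b', 'e', 'h', 'a', 'f', 'c']: output p takes the input at target.index(p)
```

Because the switch paths depend only on the control bits and not on the data, the network reproduces the sorter's permutation for any input.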

FIG. 11 is another block diagram of a system 1100 for local address mapping including an access network 1102 and one or more read-only memories (ROMs) (1104a, 1104b, 1104c) for storing pre-calculated cumulative control state values in accordance with one embodiment of the disclosure. The system 1100 further includes a background swap scheduler 1108 and a mapping state generation and change block 1110. In one aspect, the access network 1102 and ROMs (1104a, 1104b, 1104c) can be implemented in hardware (e.g., with ultra-low latency, a 3-cycle pipeline delay, and logic and memory equivalent to fewer than 10,000 logic gates), and the remaining components of the system 1100 can be implemented in firmware. In operation, the blocks of system 1100 can operate similarly to those of system 200 of FIG. 2. A primary difference in system 1100, however, is that the cumulative state is computed offline using a master bitonic network, or another suitable network, and then stored (e.g., in a table) in the ROMs (1104a, 1104b, 1104c). In one aspect, this approach can involve using a small amount of additional memory as compared to the system of FIG. 2.

Block 1104a represents a non-volatile memory (e.g., a ROM such as CCS_ROM) storing the CCS values (e.g., CCS1 and CCS2). Block 1104b represents a non-volatile memory (e.g., a ROM such as CS_ROM) storing the CS values (e.g., cs1 and cs2). Block 1104c represents a non-volatile memory (e.g., a programmable ROM such as USE_PROM) effectively storing which lines in the CS_ROM and CCS_ROM are being used, in case there is a loss of power. Effectively, the USE_PROM can be used to preserve the control state in a non-volatile memory space so that it can be restored in case of power loss. The control state values stored can include MOVE_INDEX, cs2, ccs1, ccs2, bg_transfer_address_1, bg_transfer_address2, bg_transfer_status, and/or ROM_row_index. In one aspect, upon recovery of power, the system 1100 can perform a consistency check using the USE_PROM entries (e.g., use indicators) and the control state, restore the mapping state, and resume any interrupted background transfers.

FIGS. 12a, 12b, 12c are schematic diagrams of ROMs for storing control state values, cumulative control state values, and use indicators that can be used in the system of FIG. 11 in accordance with one embodiment of the disclosure.

FIG. 12a is a schematic diagram of a ROM (CS_ROM) 1200 that can be used to store control state (CS) values used in the system of FIG. 11 in accordance with one embodiment of the disclosure. FIG. 12a illustrates one possible implementation of a non-volatile memory that can be used to store control state values. In another aspect, other implementations can also be used.

FIG. 12b is a schematic diagram of a ROM (CCS_ROM) 1202 that can be used to store cumulative control state (CCS) values used in the system of FIG. 11 in accordance with one embodiment of the disclosure. FIG. 12b illustrates one possible implementation of a non-volatile memory that can be used to store cumulative control state values. In another aspect, other implementations can also be used.

FIG. 12c is a schematic diagram of a PROM (USE_PROM) 1204 that can be used to store use indicators for the system of FIG. 11 in accordance with one embodiment of the disclosure. More specifically, the USE_PROM 1204 can be used to store index or placeholder information relating to current positions in the CS_ROM and CCS_ROM in a non-volatile memory space, to be restored in case of power loss. FIG. 12c illustrates one possible implementation of a non-volatile memory that can be used to store index information into the ROMs. In another aspect, other implementations can also be used.

In one aspect, the system 1100 of FIG. 11 can increment a ROM_row_index by 1 every time a mapping gets used, where ROM_row_index can be the address for the CS_ROM and the CCS_ROM. The system can also program a 1-bit entry in the USE_PROM to 1 to indicate that this line has already been used.
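The ROM_row_index and USE_PROM bookkeeping can be modeled in a few lines to show how the index might be rebuilt after power loss. This is a hypothetical sketch: the class name, its methods, and the recovery rule (count the unbroken prefix of used rows and reject gaps) are assumptions, as are the one-way programming of USE_PROM bits and strictly in-order consumption of ROM rows.

```python
class UsePromTracker:
    """Hypothetical model of ROM_row_index bookkeeping backed by USE_PROM bits.

    Assumptions: each USE_PROM bit can only be programmed from 0 to 1, and
    CS_ROM/CCS_ROM rows are consumed strictly in order.
    """

    def __init__(self, num_rows: int) -> None:
        self.use_prom = [0] * num_rows  # non-volatile: survives power loss
        self.rom_row_index = 0          # volatile: lost on power loss

    def use_next_mapping(self) -> int:
        """Mark the next ROM row as used and return its address."""
        row = self.rom_row_index
        if row >= len(self.use_prom):
            raise IndexError("all CS_ROM/CCS_ROM rows have been used")
        self.use_prom[row] = 1          # program the 1-bit use indicator
        self.rom_row_index += 1         # increment ROM_row_index by 1
        return row

    def recover(self) -> int:
        """Rebuild ROM_row_index from USE_PROM after power is restored."""
        used = 0
        while used < len(self.use_prom) and self.use_prom[used] == 1:
            used += 1
        # Consistency check: the used rows must form an unbroken prefix.
        if any(self.use_prom[used:]):
            raise ValueError("USE_PROM contents are inconsistent")
        self.rom_row_index = used
        return used


tracker = UsePromTracker(num_rows=8)
for _ in range(3):
    tracker.use_next_mapping()  # rows 0, 1, 2 consumed
tracker.rom_row_index = 0       # simulate losing the volatile index
print(tracker.recover())        # 3: the next unused row
```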

FIG. 13 is a block diagram of another access network 1300 including a select logic block 1302 that can be used in the address mapping system of FIG. 11 in accordance with one embodiment of the disclosure. In one aspect, the access network 1300 can be used in the system of FIG. 11 as access network 1102. The system 1300 further includes a first bitonic network 1304 and a second bitonic network 1306. The system 1300 can operate substantially the same as system 300 of FIG. 3, except that the cumulative control state values (CCS1, CCS2) are received from the ROMs (e.g., 1104a, 1104b, 1104c) rather than from an online cumulative control state block such as block 204 of FIG. 2.

The systems and methods for performing local address mapping described above may be used in conjunction with wear leveling schemes employing random address mapping using local and global interleaving. The following section describes such approaches.

Local/Global Interleaving

FIG. 14 is a block diagram of an indirection table 1400 in accordance with one embodiment of the disclosure. For example, in a drive with M pages/sectors, the indirection table has M entries, as is depicted in FIG. 14. In such case, each entry is N bits, where N is log2(M). For a 2 TB drive with 512-byte pages, M=2×10^12 B/512 B≈3.9×10^9, and thus N is equal to 32 (log2(M)≈31.9, rounded up). As such, the memory required for the table would be M×log2(M)≈125 Gbits (~15 GB). The frequency-of-use table would also consume similar space (~15 GB). So the total requirement would be around 30 GB for this metadata. In some implementations, the metadata may have to be replicated with two-plus-one redundancy, thereby increasing the requirement to around 90 GB. In such case, this memory usage amounts to around 4.5% of the disk space. So this sort of approach would generally not be practical.
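The metadata estimate above can be reproduced with a short calculation. This sketch assumes decimal units (1 GB = 10^9 bytes), a frequency-of-use table the same size as the indirection table, and three copies for two-plus-one redundancy; because it does not round intermediate values, it lands slightly above the rounded figures in the text (about 94 GB, or about 4.7% of the drive).

```python
import math

DRIVE_BYTES = 2 * 10**12  # 2 TB drive
PAGE_BYTES = 512          # 512-byte pages

pages = DRIVE_BYTES // PAGE_BYTES           # M
entry_bits = math.ceil(math.log2(pages))    # N = ceil(log2(M))
table_bytes = pages * entry_bits / 8        # indirection table
metadata_bytes = 2 * table_bytes            # plus frequency-of-use table (assumed same size)
replicated_bytes = 3 * metadata_bytes       # two-plus-one redundancy

print(f"M = {pages:.3e} pages, N = {entry_bits} bits")
print(f"table = {table_bytes / 1e9:.1f} GB, metadata = {metadata_bytes / 1e9:.1f} GB")
print(f"replicated = {replicated_bytes / 1e9:.1f} GB "
      f"({100 * replicated_bytes / DRIVE_BYTES:.1f}% of the drive)")
```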

FIG. 15 is a block diagram of a general system for performing random address mapping using local and global interleaving in accordance with one embodiment of the disclosure. The system 1500 includes a lookup table 1502 that can be used to store 2^G entries, with a depth of 2^G and a width of G bits. The system 1500 also includes a multi-stage interconnection network (MIN) 1504 that can be used to provide permutations of data sets, and a control state block 1506 that can be used to control the MIN 1504. The system 1500 illustrates a general framework for mapping an N-bit logical address space to an N-bit physical space by first dividing the address bits into G bits and N−G bits. In general, any G bits out of the N bits can be selected using another fixed network. In this context, a fixed network can simply be a fixed arrangement of wires that implements a specific network. As compared to a multi-stage programmable interconnection network, the fixed network may not have programmability. For simplicity, the G bits selected are the most significant bits (MSBs) of the N bits. So the system can perform mapping on 2^G entries in block 1502 and perform bit permutation on N−G bits in block 1504. The G bits can be mapped using the 2^G-entry mapping table 1502. In one aspect, the mapping can be performed such that there is a one-to-one unique mapping and the input is not equal to the output. Also, in one aspect, G is selected such that 1<=G<=N. In one aspect, the case of G<=6 may be of particular interest. If G=N, then this case can be equivalent to the conventional mapping table approach.
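The framework of FIG. 15 can be sketched as two independent maps on the two parts of the address. Assumptions: the G global bits are the MSBs, the lookup table 1502 is a plain Python list, and the MIN 1504 is modeled as a fixed bit permutation drawn from a seeded random generator rather than an actual switch network. N=6, G=2, the table contents, and the function names are hypothetical.

```python
from __future__ import annotations
import random


def permute_bits(value: int, perm: list[int]) -> int:
    """Fixed bit permutation: output bit i is taken from input bit perm[i]."""
    out = 0
    for i, src in enumerate(perm):
        out |= ((value >> src) & 1) << i
    return out


def map_address(addr: int, n: int, g: int,
                global_table: list[int], local_perm: list[int]) -> int:
    """Split an n-bit address into G MSBs and N-G LSBs and map each part."""
    low_bits = n - g
    msbs = addr >> low_bits              # G bits -> 2**G-entry table
    lsbs = addr & ((1 << low_bits) - 1)  # N-G bits -> bit permutation
    return (global_table[msbs] << low_bits) | permute_bits(lsbs, local_perm)


# Hypothetical parameters: N = 6 address bits, G = 2 global bits.
N, G = 6, 2
GLOBAL_TABLE = [2, 3, 0, 1]       # one-to-one, and no k maps to itself
LOCAL_PERM = list(range(N - G))
random.Random(7).shuffle(LOCAL_PERM)  # stands in for seeded MIN settings

mapped = [map_address(a, N, G, GLOBAL_TABLE, LOCAL_PERM) for a in range(1 << N)]
print(sorted(mapped) == list(range(1 << N)))  # True: the combined map is one-to-one
```

Since each part is a bijection on its own bits, the combined map is a bijection on the full N-bit space, while only 2^G table entries and N−G switch settings need to be stored.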

In one embodiment, the global mapping can satisfy one or more properties. For example, in one aspect, the global mapping can be a one-to-one function. In another aspect, the global mapping can be performed such that the input is not equal to the output. In another aspect, a swap can be performed such that the global mapping of a number k is equal to kk, while the global mapping of kk is equal to k. So suitable functions for global mapping may include bit inverse mapping, random swap, deterministic swap, and other suitable functions. Bit inverse mapping can be chosen for a simple hardware implementation. If a table is used, the maximum size of the table needed can be 2^G entries, with each entry having a width of G bits. Since G is not more than 7 in this example, the table approach is also suitable.
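The candidate global mappings and the properties listed above can be expressed as 2^G-entry tables and checked mechanically. This is a sketch under stated assumptions: the pairing used by deterministic_swap (k with k XOR 1) is one hypothetical choice of deterministic swap, and random_swap pairs values using Python's seeded random generator; all function names are illustrative.

```python
from __future__ import annotations
import random


def bit_inverse(g: int) -> list[int]:
    """Invert all G bits: k -> (~k within G bits)."""
    mask = (1 << g) - 1
    return [k ^ mask for k in range(1 << g)]


def deterministic_swap(g: int) -> list[int]:
    """Hypothetical fixed pairing: swap each even k with k + 1."""
    return [k ^ 1 for k in range(1 << g)]


def random_swap(g: int, seed: int) -> list[int]:
    """Pair the 2**G values at random; each pair (k, kk) maps k->kk and kk->k."""
    values = list(range(1 << g))
    random.Random(seed).shuffle(values)
    table = [0] * (1 << g)
    for k, kk in zip(values[0::2], values[1::2]):
        table[k], table[kk] = kk, k
    return table


def is_valid_global_map(table: list[int]) -> bool:
    """One-to-one, and no input maps to itself."""
    one_to_one = sorted(table) == list(range(len(table)))
    no_fixed_point = all(out != k for k, out in enumerate(table))
    return one_to_one and no_fixed_point


tables = [f(g) for g in range(1, 7) for f in (bit_inverse, deterministic_swap)]
tables += [random_swap(g, seed=g) for g in range(1, 7)]
print(all(is_valid_global_map(t) for t in tables))  # True
print(bit_inverse(2), deterministic_swap(2))         # [3, 2, 1, 0] [1, 0, 3, 2]
```

All three constructions are also swaps in the sense above (applying the map twice returns k), and bit inverse needs no table at all in hardware.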

In one embodiment, the local mapping can satisfy one or more properties. For example, in one aspect, the local mapping can be a one-to-one function. So suitable functions for local mapping may include deterministic mapping and/or random mapping. In one aspect, random mapping may be selected. Deterministic or random mapping may be implemented using tables or an Omega network, a Butterfly network, a Benes network, or another suitable network. In one aspect, a Benes network (e.g., a master-slave Benes network) is selected because it has the lowest complexity for computing the required switch states. In this network, bitonic sorting can be implemented on the master Benes network, on sequences with certain properties, to derive the switch states for the slave Benes network. In one embodiment, the local address mapping can be performed using any of the local address mapping schemes described above in conjunction with FIGS. 1-13.

In one embodiment, a wear leveling algorithm implemented with the random address mapping can involve operating in an address space, set partitioning the address space, and local and global interleaving in the address space. In one aspect, the wear leveling algorithm can involve a gradual deterministic transition from one memory map to another memory map.

FIG. 16 is a flow chart of a process for performing random address mapping using global mapping and local interleaving in accordance with one embodiment of the disclosure. In one embodiment, the process can be used for wear leveling or other random address mapping in any of the random mapping systems described herein. In block 1602, the process identifies a number of bits (N) in a physical address space of a non-volatile memory (NVM). In block 1604, the process selects at least one bit (G) of the N bits of the physical address space to be used for global interleaving, where G is less than N. In block 1606, the process determines a number of bits equal to N minus G (N−G) to be used for local interleaving.

In block 1608, the process maps the G bit(s) using a mapping function for global interleaving. In one embodiment, the mapping function can be a bit inverse mapping function, a random swap mapping function, a deterministic swap mapping function, and/or another suitable mapping function.

In block 1610, the process interleaves the (N−G) bits using an interleaving function for local interleaving. In one embodiment, the interleaving function can be a deterministic interleaving function, a random interleaving function, and/or another suitable interleaving function. In one embodiment, the interleaving function can be implemented using an Omega network, a Butterfly network, a Benes network, a master-slave Benes network, and/or another suitable network.

In some embodiments, the mapping function for the global interleaving is a bit inverse mapping function, and the interleaving function is implemented using a master-slave Benes network. In one such embodiment, the G bit(s) are the most significant bit(s) of the physical address space of the NVM, and the bit inverse mapping function involves inverting each of the G bit(s).

In block 1612, the process generates a combined mapping including the mapped G bit(s) and the interleaved (N−G) bits. In one embodiment, the combined mapping constitutes a mapped physical address (see, for example, column 2006 in FIG. 20, as will be discussed in more detail below).

FIG. 17 is a block diagram of a system for performing random address mapping with bit inverse for global mapping (G bits) and permutation for local interleaving (N−G bits) in accordance with one embodiment of the disclosure. The system 1700 includes a bit inverse block 1702 that can be used to invert selected bits of the logical address. In one aspect, for example, the bit inverse block 1702 can be used to map G bits using a mapping function for global interleaving as is described in block 1608 of FIG. 16, where the mapping function is a bit inverse function. The system 1700 also includes a multi-stage interconnection network (MIN) 1704 that can be used to provide permutations of data sets, such as permutations of selected bits of the logical address. In one aspect, the MIN 1704 can be used to interleave N−G bits using an interleaving function for local interleaving as is described in block 1610 of FIG. 16. The system 1700 also includes a control state block 1706 that can be used to control the MIN 1704.

The system 1700 further includes a processor 1708, which can be used to control and/or perform computations for the bit inverse block 1702 and the MIN 1704. In this context, processor 1708 refers to any machine or selection of logic that is capable of executing a sequence of instructions and should be taken to include, but not be limited to, general purpose microprocessors, special purpose microprocessors, central processing units (CPUs), digital signal processors (DSPs), application specific integrated circuits (ASICs), signal processors, microcontrollers, and other suitable circuitry. Further, it should be appreciated that the terms processor, microprocessor, circuitry, controller, and other such terms refer to any type of logic or circuitry capable of executing logic, commands, instructions, software, firmware, functionality, or other such information. In one aspect, the processor 1708 can be used to identify a number of bits (N) in a physical address space of a non-volatile memory (NVM) as is described in block 1602 of FIG. 16, select at least one bit (G) of the N bits of the physical address space to be used for global interleaving, where G is less than N, as is described in block 1604 of FIG. 16, and/or determine a number of bits equal to N minus G (N−G) to be used for local interleaving as is described in block 1606 of FIG. 16. In one aspect, the processor 1708 can also be used to generate a combined mapping including the mapped G bit(s) and the interleaved (N−G) bits as is described in block 1612 of FIG. 16. In one embodiment, the combined mapping is instead generated by block 1702 and/or block 1704.

In one simple example to illustrate the address space operations, and as depicted in FIG. 17, assume the number of pages (M) in the NVM is 16 (i.e., M=16 pages). In such case, the number of address bits (N) can be computed as N=log2(M)=4 address bits. The parameters of the configuration would then be as follows: G=1 (2^G partitions) and L=N−G=4−1=3 (a 3×3 network). This simple example will be carried through FIGS. 18 to 20.

FIG. 18 is a table 1800 illustrating an example of global mapping using bit inverse on G bits in accordance with one embodiment of the disclosure. In one aspect, the table 1800 of FIG. 18 can be viewed as an example of the global mapping performed by block 1702 of FIG. 17. In the continuing simple example, G is 1 bit (i.e., the most significant bit (MSB) of the 4 address bits). In the example of FIG. 18, the table 1800 illustrates the initial addresses in the left column, shown in both decimal and binary. The table 1800 also illustrates the final addresses, after global mapping using bit inverse on the G bits (i.e., the MSB), in the right column of addresses, shown in both decimal and binary. As can be seen in FIG. 18, the global mapping using bit inverse is a one-to-one function, and the input is not equal to the output. This implementation is consistent with one or more of the possible design characteristics discussed above.

FIG. 19 is a table 1900 illustrating an example of local interleaving using a permutation on N−G bits in accordance with one embodiment of the disclosure. More specifically, for the local interleaving of address bits, assume the 3 address bits ([x2 x1 x0]) are permuted to [x2 x0 x1]. In the example of FIG. 19, the table 1900 illustrates the initial addresses in the left column, shown in both decimal and binary. The table 1900 also illustrates the final addresses, after local mapping using the selected permutation, in the right column of addresses, shown in both decimal and binary. As can be seen in FIG. 19, the local interleaving using permutation is a one-to-one function. This implementation is consistent with one or more of the possible design characteristics discussed above. In one aspect, the table 1900 of FIG. 19 can be viewed as an example of the local interleaving shown in block 1704 of FIG. 17.

FIG. 20 is a table 2000 illustrating an example of global mapping using bit inverse and local interleaving using permutation in accordance with one embodiment of the disclosure. The leftmost column 2002 shows the original addresses in decimal. The middle column 2004 shows the effect of global mapping/interleaving only and matches the final column (e.g., results) of FIG. 18. The rightmost column 2006 shows the resulting physical addresses with both the global mapping using bit inverse and the local interleaving using a selected permutation. This simple example illustrates one possible operation of the systems and methods of FIGS. 15-17. More specifically, the table 2000 of FIG. 20 can be viewed as an example of the combined mapping that can be generated by any combination of the processor 1708 and blocks 1702 and 1704 of FIG. 17.
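The 16-page example of FIGS. 18 to 20 can be regenerated from the two stated rules: invert the single MSB, then permute the low bits [x2 x1 x0] to [x2 x0 x1], which swaps x1 and x0. This is a sketch whose values are computed from those rules alone; the figure tables themselves are not reproduced here, so treat any match with their exact layout as an assumption. The function names are hypothetical.

```python
N, G = 4, 1         # M = 16 pages -> 4 address bits, 1 global bit
LOCAL_BITS = N - G  # 3 bits locally interleaved


def global_map(addr: int) -> int:
    """Bit-inverse the G most significant bits (as in FIG. 18)."""
    return addr ^ (((1 << G) - 1) << LOCAL_BITS)


def local_interleave(addr: int) -> int:
    """Permute [x2 x1 x0] -> [x2 x0 x1], i.e. swap x1 and x0 (as in FIG. 19)."""
    x0 = addr & 1
    x1 = (addr >> 1) & 1
    return (addr & ~0b11) | (x0 << 1) | x1


rows = []
for lba in range(1 << N):
    g = global_map(lba)
    pba = local_interleave(g)
    rows.append((lba, g, pba))
    print(f"{lba:2d}  ->  {g:2d} ({g:04b})  ->  {pba:2d} ({pba:04b})")
```

The three printed columns correspond to the roles of columns 2002, 2004, and 2006 in FIG. 20; because the two steps act on disjoint bits, their order does not matter.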

FIG. 21 is a block diagram of a multi-stage interconnection network (MIN) 2100 that can be used to perform local interleaving (e.g., block 1704 in FIG. 17) in accordance with one embodiment of the disclosure. This MIN approach (e.g., a multi-stage interconnection network, or MIN, with 2^N entries) for generating a random mapping from logical space to physical space may be expensive to implement, as the storage size can be large.

More specifically, in one aspect, moving items has to be done in a certain order defined by the mapping. For a read process, to differentiate which chip select (CS) has to be used, another table of 2^N entries needs to be maintained. In contrast, the CS chip storage is equal to log2(N)*N/2 for an Omega network and log2(N)*N for a Benes network.

FIG. 22 is a block diagram of a butterfly MIN 2200 that can be used to perform local interleaving in accordance with one embodiment of the disclosure. This MIN approach (e.g., a butterfly MIN on 2^N entries) for generating a random mapping from logical space to physical space is a suitable multi-stage interconnection network that may be used, for example, for the MIN 1704 of FIG. 17 or the MIN 1504 of FIG. 15.

For the trivial case of a shuffle equal to 1 for the physical address space, the network is not needed, as it is easy to figure out the mapping. In this context, an address shuffle can be defined as a left cyclic shift of the physical address, which is a binary string. Consider, for example, stages 1 to M. At stage k, the physical address of a logical address, given by $(x_{n-1}, x_{n-2}, x_{n-3}, \ldots, x_{n-k}, \ldots, x_1, x_0)$, is converted (via inverse) to $(X_{n-1}, X_{n-2}, X_{n-3}, \ldots, X_{n-k-1}, \ldots, x_1, x_0)$. In one aspect, other simpler cases may include a butterfly permutation, where the MSB is swapped with the LSB; a substitution permutation, where any ith bit is swapped with bit 0 (e.g., the LSB); and a super permutation, where any ith bit is swapped with the MSB. In another aspect, the local interleaving may involve using any switch combination for each stage.
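The address shuffle and the simpler bit permutations named above can be written as bit operations on an n-bit address. This is a minimal sketch; the function names are hypothetical, and each function follows the one-line definition given in the text.

```python
def shuffle(addr: int, n: int) -> int:
    """Address shuffle: left cyclic shift of an n-bit address."""
    msb = (addr >> (n - 1)) & 1
    return ((addr << 1) & ((1 << n) - 1)) | msb


def swap_bits(addr: int, i: int, j: int) -> int:
    """Exchange bits i and j of addr (no-op when they are equal)."""
    if ((addr >> i) & 1) != ((addr >> j) & 1):
        addr ^= (1 << i) | (1 << j)
    return addr


def butterfly(addr: int, n: int) -> int:
    """Butterfly permutation: swap the MSB with the LSB."""
    return swap_bits(addr, n - 1, 0)


def substitution(addr: int, i: int) -> int:
    """Substitution permutation: swap bit i with bit 0 (the LSB)."""
    return swap_bits(addr, i, 0)


def super_permutation(addr: int, n: int, i: int) -> int:
    """Super permutation: swap bit i with the MSB."""
    return swap_bits(addr, i, n - 1)


n = 4
print(f"{shuffle(0b1011, n):04b}")               # 0111
print(f"{butterfly(0b1000, n):04b}")             # 0001
print(f"{substitution(0b0100, 2):04b}")          # 0001
print(f"{super_permutation(0b0010, n, 1):04b}")  # 1000
```

Each function only rewires address bits, so each is a one-to-one map on the 2^n addresses; applying the shuffle n times returns the original address.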

In general, a MIN may be used in one of two modes. For example, in a routing mode, the switches in the MIN are configured to realize the desired mapping from input ports to output ports in one or more passes. In such case, each input port takes a multi-bit (say m-bit) word, each output port gives an m-bit word, and there are N inputs and N outputs. In a second mode, an interleaving mode, the switches in the MIN are configured using a random seed. This results in a random mapping from input ports to output ports in a single pass. In several aspects, the interleavers and/or interleaving described herein can use a MIN in the interleaving mode to interleave preselected bits in a desired manner.

FIG. 23 is a block diagram of a Benes MIN 2300 that can be used to perform local interleaving in accordance with one embodiment of the disclosure. This MIN approach (e.g., a Benes MIN on 2^N entries) for generating a random mapping from logical space to physical space is a suitable multi-stage interconnection network that may be used, for example, for the MIN 1704 of FIG. 17 or the MIN 1504 of FIG. 15.

FIG. 24 is a block diagram of an Omega MIN 2400 that can be used to perform local interleaving in accordance with one embodiment of the disclosure. This MIN approach (e.g., an Omega MIN on 2^N entries) for generating a random mapping from logical space to physical space is a suitable multi-stage interconnection network that may be used, for example, for the MIN 1704 of FIG. 17 or the MIN 1504 of FIG. 15. In one aspect, the Omega network may only be able to provide a subset of all possible permutations of switching, while the Benes network may be able to provide all possible permutations. In one aspect, if a specific permutation is required, it may be difficult to solve for the chip select settings of the Benes network. To counter this potential issue, one implementation of the Benes network involves randomly setting the chip select settings, which can make the chip select algorithm much simpler. That is, the randomly generated chip select settings reduce the computing time and/or computing challenges involved in solving for the chip select settings.

FIG. 25 shows a block diagram of a modified (8×8) Omega MIN 2500 that can be used to perform local interleaving in accordance with one embodiment of the disclosure. In general, Omega networks are (N×N) multistage interconnection networks that are sized according to integer powers of two. Thus, Omega networks have sizes of N = 2, 4, 8, 16, 32, 64, 128, etc. Further, the number L of stages in an Omega network is equal to log₂(N), and the number of (2×2) switches per stage is equal to N/2.
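The sizing rules above can be checked with a small helper. This is a sketch; `omega_dimensions` is a hypothetical name, and the total switch count is simply L × N/2, which also equals the number of control bits when each switch is configured by one bit.

```python
def omega_dimensions(n: int) -> tuple:
    """Return (stages, switches_per_stage, total_switches) for an N x N Omega network."""
    if n < 2 or n & (n - 1):
        raise ValueError("Omega networks are sized by integer powers of two")
    stages = n.bit_length() - 1  # L = log2(N)
    switches_per_stage = n // 2  # N/2 (2x2) switches per stage
    return stages, switches_per_stage, stages * switches_per_stage


for n in (2, 4, 8, 16, 32, 64, 128):
    print(n, omega_dimensions(n))  # e.g. 8 -> (3, 4, 12)
```

For the (8×8) network of FIG. 25 this gives 3 stages of 4 switches, i.e., 12 switches in total.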

Omega network 2500 is an (8×8) network that receives eight input values at eight input terminals A[0:7] and maps the eight input values to eight output terminals B[0:7]. Each input value may be any suitable value, such as a single bit, a plurality of bits, a sample, or a soft value (such as a Viterbi log-likelihood ratio (LLR) value) having a hard-decision bit and at least one confidence-value bit. The eight input values are mapped to the eight output terminals using log₂(8) = 3 configurable stages i, where i = 1, 2, 3, each of which comprises 8/2 = 4 (2×2) switches.

Each stage i receives the eight input values from the previous stage, or from input terminals A[0:7] in the case of stage 1, via a fixed interconnection system (e.g., 2502, 2504, and 2506) that implements a perfect shuffle on the eight input values. A perfect shuffle is a process equivalent to (i) dividing a deck of cards into two equal piles, and (ii) shuffling the two equal piles together in alternating fashion such that the cards from the first pile alternate with the cards from the second pile.
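The card-deck description translates directly into code. The following sketch (the name `perfect_shuffle` is illustrative) splits the values into two equal halves and interleaves them, as a fixed interconnection system such as 2502 would.

```python
def perfect_shuffle(values: list) -> list:
    """Split `values` into two equal halves and interleave them,
    like riffling two equal piles of cards together."""
    if len(values) % 2:
        raise ValueError("a perfect shuffle needs an even number of values")
    half = len(values) // 2
    out = []
    for top, bottom in zip(values[:half], values[half:]):
        out.extend((top, bottom))
    return out


print(perfect_shuffle(list(range(8))))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

For N = 2^k values, the value at position p moves to the position obtained by rotating the k-bit index of p left by one bit, so applying the shuffle k times (three times for N = 8) restores the original order.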

For example, stage 1 receives eight input values from input terminals A[0:7] via fixed interconnection system 2502. Fixed interconnection system 2502 performs a perfect shuffle on the eight input values by dividing the eight input values received at input terminals A[0:7] into a first set corresponding to input terminals A[0:3] and a second set corresponding to input terminals A[4:7], and then interleaving the two sets so that values from the first set alternate with values from the second set. Similarly, fixed interconnection system 2504 performs a perfect shuffle on the outputs of the switches of stage 1 and provides the shuffled outputs to the switches of stage 2, and fixed interconnection system 2506 performs a perfect shuffle on the outputs of the switches of stage 2 and provides the shuffled outputs to the switches of stage 3.

In addition to receiving eight input values, each configurable stage i receives a four-bit control signal Ci[0:3] from control signal memory (e.g., ROM), wherein each bit of the four-bit control signal configures a different one of the four 2×2 switches in the stage. Thus, the switches of stage 1 are configured based on the values of control bits C1[0], C1[1], C1[2], and C1[3], the switches of stage 2 are configured based on the values of control bits C2[0], C2[1], C2[2], and C2[3], and the switches of stage 3 are configured based on the values of control bits C3[0], C3[1], C3[2], and C3[3].

Setting a control bit to a value of one configures the corresponding switch as a crossed connection such that (i) the value received at the upper input is provided to the lower output and (ii) the value received at the lower input is provided to the upper output. Setting a control bit to a value of zero configures the corresponding switch as a straight pass-through connection such that (i) the value received at the upper input is provided to the upper output and (ii) the value received at the lower input is provided to the lower output.
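Putting the shuffle, the per-stage control bits, and the crossed/straight switch behavior together, an (N×N) Omega network can be sketched as follows. This is a behavioral model under one stated assumption: switch j of each stage takes the shuffled positions 2j (upper input) and 2j+1 (lower input). The name `omega_network` and the list-of-lists layout of the control bits (controls[i-1][j] holding Ci[j]) are hypothetical.

```python
def omega_network(inputs: list, controls: list) -> list:
    """Pass `inputs` through an N x N Omega network.

    controls[i][j] is the control bit of switch j in stage i+1 (C(i+1)[j]):
    1 = crossed, 0 = straight pass-through. Assumption: switch j of each
    stage takes shuffled positions 2j (upper input) and 2j+1 (lower input).
    """
    values = list(inputs)
    n = len(values)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("N must be a power of two >= 2")
    if len(controls) != stages or any(len(c) != n // 2 for c in controls):
        raise ValueError("need log2(N) stages of N/2 control bits each")
    half = n // 2
    for stage_bits in controls:
        # Fixed interconnection: perfect shuffle of the previous stage's outputs.
        values = [values[k // 2 + (k % 2) * half] for k in range(n)]
        nxt = []
        for j, bit in enumerate(stage_bits):
            upper, lower = values[2 * j], values[2 * j + 1]
            nxt.extend((lower, upper) if bit else (upper, lower))
        values = nxt
    return values


a = list(range(8))
print(omega_network(a, [[0] * 4] * 3))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(omega_network(a, [[1] * 4] * 3))  # [7, 6, 5, 4, 3, 2, 1, 0]
```

With all control bits at zero, the three shuffles cancel out and the inputs pass through unchanged; with all bits at one, the network reverses the order. Any other 12-bit control pattern still yields a permutation of the inputs.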

In signal-processing applications, multistage interconnection networks, such as Omega network 2500, are often used for routing purposes to connect processors on one end of the network to memory elements on the other end. However, multistage interconnection networks may also be used in signal-processing applications for other purposes, such as for permuting or interleaving a contiguous data stream.

FIG. 25 illustrates one implementation of a suitable Omega MIN configured for interleaving. In other embodiments, other implementations of a suitable Omega MIN can be used.

While the above description contains many specific embodiments of the invention, these should not be construed as limitations on the scope of the invention, but rather as examples of specific embodiments thereof. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.

The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain method, event, state or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described tasks or events may be performed in an order other than that specifically disclosed, or multiple tasks or events may be combined in a single block or state. The example tasks or events may be performed in serial, in parallel, or in some other suitable manner. Tasks or events may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.

What is claimed is:
 1. A method for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA), the method comprising: generating a first physical block address (PBA) candidate from an LBA using a first function; generating a second physical block address (PBA) candidate from the LBA using a second function; and selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.
 2. The method of claim 1, further comprising accessing data stored at the selected PBA candidate via the LBA.
 3. The method of claim 1, wherein the data access is one of a read access or a write access.
 4. The method of claim 1, wherein the information related to the background swap of data stored at the first PBA candidate and the background swap of data stored at the second PBA candidate comprises a status of the background swap of data stored at the first PBA candidate and a status of the background swap of data stored at the second PBA candidate.
 5. The method of claim 1, further comprising: mapping a portion of a physical address space containing the selected PBA candidate to another portion of the physical address space using at least one of a background data move or a background data swap.
 6. The method of claim 1, wherein the selecting either the first PBA candidate or the second PBA candidate comprises selecting either the first PBA candidate or the second PBA candidate using a memory table.
 7. The method of claim 1, wherein at least one of the first function or the second function comprises a function performed by at least one of a multi-stage interconnection network or a block cipher.
 8. The method of claim 7, wherein the multi-stage interconnection network comprises at least one of a Benes network, an inverse Benes network, a Bitonic network, an inverse Bitonic network, an Omega network, an inverse Omega network, a Butterfly network, or an inverse Butterfly network.
 9. The method of claim 1: wherein the generating the first PBA candidate from the LBA using the first function comprises generating the first PBA candidate within a physical block address (PBA) map from the LBA using the first function and a first cumulative control state; wherein the generating the second PBA candidate from the LBA using the second function comprises generating the second PBA candidate within the PBA map from the LBA using the second function and a second cumulative control state; wherein the selecting either the first PBA candidate or the second PBA candidate for data access comprises: determining a position of the second PBA candidate relative to a midpoint of all entries in the PBA map; determining a PBA move counter based on the position of the second PBA candidate; comparing the PBA move counter to a move index indicative of a current position of PBA swaps within the PBA map; and selecting either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index.
 10. The method of claim 9, wherein the determining the PBA move counter based on the position of the second PBA candidate comprises assigning the PBA move counter to the value of the second PBA candidate.
 11. The method of claim 9, wherein the determining the PBA move counter based on the position of the second PBA candidate comprises assigning the PBA move counter to the value of the first PBA candidate.
 12. The method of claim 9, wherein the selecting either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index comprises selecting the first PBA candidate.
 13. The method of claim 9, wherein the selecting either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index comprises selecting the second PBA candidate.
 14. The method of claim 1: wherein the generating the first PBA candidate from the LBA using the first function comprises generating the first PBA candidate within a physical block address (PBA) map from the LBA using the first function and a first cumulative control state; wherein the generating the second PBA candidate from the LBA using the second function comprises generating the second PBA candidate within the PBA map from the LBA using the second function and a second cumulative control state; wherein the selecting either the first PBA candidate or the second PBA candidate comprises: determining a position of the first PBA candidate relative to a midpoint of all entries in the PBA map; determining a PBA move counter based on the position of the first PBA candidate; comparing the PBA move counter to a move index indicative of a current position of PBA swaps within the PBA map; and selecting either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index.
 15. A system for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA), the system comprising: a first network configured to generate a first PBA candidate from an LBA using a first function; a second network configured to generate a second PBA candidate from the LBA using a second function; and a select logic configured to select either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.
 16. The system of claim 15, further comprising a processor configured to access data stored at the selected PBA candidate via the LBA.
 17. The system of claim 15, wherein the data access is one of a read access or a write access.
 18. The system of claim 15, wherein the information related to the background swap of data stored at the first PBA candidate and the background swap of data stored at the second PBA candidate comprises a status of the background swap of data stored at the first PBA candidate and a status of the background swap of data stored at the second PBA candidate.
 19. The system of claim 18: wherein the first PBA candidate and the second PBA candidate are within a PBA map; and wherein the status of the background swap of data stored at the second PBA candidate comprises a position of the second PBA candidate relative to a midpoint of all entries in the PBA map, a PBA move counter based on the position of the second PBA candidate, and a move index indicative of a current position of PBA swaps within the PBA map.
 20. The system of claim 15, further comprising a mapper configured to map a portion of a physical address space containing the selected PBA to another portion of the physical address space using at least one of a background data move or a background data swap.
 21. The system of claim 15, wherein the select logic is configured to select the first PBA or the second PBA using a memory table.
 22. The system of claim 15, wherein the first network includes at least one of a multi-stage interconnection network or a block cipher network.
 23. The system of claim 22, wherein the multi-stage interconnection network comprises at least one of a Benes network, an inverse Benes network, a Bitonic network, an inverse Bitonic network, an Omega network, an inverse Omega network, a Butterfly network, or an inverse Butterfly network.
 24. The system of claim 15: wherein the first network is configured to generate the first PBA candidate within a physical block address (PBA) map from the LBA using the first function and a first cumulative control state; wherein the second network is configured to generate the second PBA candidate within the PBA map from the LBA using the second function and a second cumulative control state; wherein the select logic is configured to: determine a position of the second PBA candidate relative to a midpoint of all entries in the PBA map; determine a PBA move counter based on the position of the second PBA candidate; compare the PBA move counter to a move index indicative of a current position of PBA swaps within the PBA map; and select either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index.
 25. The system of claim 24, wherein the select logic is configured to assign the PBA move counter to the value of the second PBA candidate.
 26. The system of claim 24, wherein the select logic is configured to assign the PBA move counter to the value of the first PBA candidate.
 27. The system of claim 24, wherein the select logic is configured to select the first PBA candidate.
 28. The system of claim 24, wherein the select logic is configured to select the second PBA candidate.
 29. The system of claim 15: wherein the first network is configured to generate the first PBA candidate within a physical block address (PBA) map from the LBA using the first function and a first cumulative control state; wherein the second network is configured to generate the second PBA candidate within the PBA map from the LBA using the second function and a second cumulative control state; wherein the select logic is configured to: determine a position of the first PBA candidate relative to a midpoint of all entries in the PBA map; determine a PBA move counter based on the position of the first PBA candidate; compare the PBA move counter to a move index indicative of a current position of PBA swaps within the PBA map; and select either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index.
 30. A system for determining a physical block address (PBA) of a non-volatile memory (NVM) to enable a data access of a corresponding logical block address (LBA), the system comprising: means for generating a first PBA candidate from an LBA using a first function; means for generating a second PBA candidate from the LBA using a second function; and means for selecting either the first PBA candidate or the second PBA candidate for the data access based on information related to a background swap of data stored at the first PBA candidate and a background swap of data stored at the second PBA candidate.
 31. The system of claim 30: wherein the means for generating the first PBA candidate from the LBA using the first function comprises means for generating the first PBA candidate within a physical block address (PBA) map from the LBA using the first function and a first cumulative control state; wherein the means for generating the second PBA candidate from the LBA using the second function comprises means for generating the second PBA candidate within the PBA map from the LBA using the second function and a second cumulative control state; wherein the means for selecting either the first PBA candidate or the second PBA candidate for data access comprises: means for determining a position of the second PBA candidate relative to a midpoint of all entries in the PBA map; means for determining a PBA move counter based on the position of the second PBA candidate; means for comparing the PBA move counter to a move index indicative of a current position of PBA swaps within the PBA map; and means for selecting either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index.
 32. The system of claim 30: wherein the means for generating the first PBA candidate from the LBA using the first function comprises means for generating the first PBA candidate within a physical block address (PBA) map from the LBA using the first function and a first cumulative control state; wherein the means for generating the second PBA candidate from the LBA using the second function comprises means for generating the second PBA candidate within the PBA map from the LBA using the second function and a second cumulative control state; wherein the means for selecting either the first PBA candidate or the second PBA candidate comprises: means for determining a position of the first PBA candidate relative to a midpoint of all entries in the PBA map; means for determining a PBA move counter based on the position of the first PBA candidate; means for comparing the PBA move counter to a move index indicative of a current position of PBA swaps within the PBA map; and means for selecting either the first PBA candidate or the second PBA candidate based on the comparison of the PBA move counter and the move index.
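The candidate-selection steps recited in claims 9 to 13 can be illustrated with a small simulation. This is a minimal sketch under assumptions the claims do not fix: each background swap exchanges a pair of PBAs with one member below the midpoint of the PBA map and one at or above it; swaps proceed in increasing order of the lower member; and the move index counts completed swaps. Under these assumptions, the PBA move counter is the value of whichever candidate lies below the midpoint (the second candidate as in claim 10, otherwise the first as in claim 11). The functions `f1` and `f2` (a multiply-by-odd-constant bijection and an XOR with a mask) are hypothetical stand-ins for the two networks and their cumulative control states.

```python
def select_pba(p1: int, p2: int, midpoint: int, move_index: int) -> int:
    """Choose between candidate p1 (old mapping) and p2 (new mapping)."""
    # PBA move counter: the swap pair is identified by its member below the midpoint.
    move_counter = p2 if p2 < midpoint else p1
    # Swaps whose counter is below move_index are done, so the data now sits at p2.
    return p2 if move_counter < move_index else p1


def simulate_background_swaps(n_bits: int, mask: int, multiplier: int) -> bool:
    """Return True if select_pba finds every LBA's data before, during and
    after one pass of background swaps. f1/f2 are hypothetical stand-ins."""
    n = 1 << n_bits
    midpoint = n // 2
    if mask >= n or not (mask >> (n_bits - 1)) & 1:
        raise ValueError("mask needs its top bit set so each pair straddles the midpoint")
    if multiplier % 2 == 0:
        raise ValueError("multiplier must be odd so f1 is a bijection")

    def f1(lba: int) -> int:  # hypothetical first function (old mapping)
        return (lba * multiplier) % n

    def f2(lba: int) -> int:  # hypothetical second function (new mapping)
        return f1(lba) ^ mask

    phys = [None] * n
    for lba in range(n):
        phys[f1(lba)] = lba  # data initially laid out by the first function
    for move_index in range(midpoint + 1):
        for lba in range(n):
            if phys[select_pba(f1(lba), f2(lba), midpoint, move_index)] != lba:
                return False
        if move_index < midpoint:  # background swap of pair {move_index, move_index ^ mask}
            a, b = move_index, move_index ^ mask
            phys[a], phys[b] = phys[b], phys[a]
    return True


print(simulate_background_swaps(n_bits=3, mask=0b101, multiplier=3))
```

Because every access is resolved by one comparison against the move index, reads and writes can proceed while the swaps run in the background, without a per-entry lookup table.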